Arabic font recognition based on diacritics features

Authors

  • Mohammed Lutf
  • Xinge You
  • Yiu-ming Cheung
  • C. L. Philip Chen
Abstract

Many methods have been proposed for Arabic font recognition, but none of them has taken the distinctive characteristics of the Arabic writing system into account. Most of these methods are either general pattern recognition approaches or adaptations of methods developed for languages other than Arabic. This paper is therefore the first attempt to present an alternative method for Arabic font recognition based on diacritics. It presents diacritics as the thumbprint of Arabic fonts: they can be used on their own to identify and recognize the font type. Diacritics are the marks and strokes that were added to the original Arabic alphabet. Although they are the smallest regions in Arabic script, today's technology makes it easy to obtain high-resolution images at very low cost, and in such images the diacritics reveal very useful information about the font type. In this study, two diacritics segmentation algorithms were developed: a flood-fill-based algorithm and a clustering-based algorithm. Experiments show that our approach achieves an average recognition rate of 98.73% on a typical database containing 10 of the most popular Arabic fonts. Compared with existing methods, our approach has the lowest computational cost and can be integrated with OCR systems very easily. Moreover, it can recognize the font type regardless of the amount of input data, since five diacritics, which in most cases can be found in a single word, are enough for font recognition. © 2013 Elsevier Ltd. All rights reserved.
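
The abstract does not describe how the flood-fill-based segmentation works, so the following is only a generic sketch, not the authors' algorithm. It assumes a binarized image given as a list of 0/1 rows (1 = ink), labels 8-connected ink components with a breadth-first flood fill, and then applies a hypothetical size rule (a component smaller than `ratio` times the largest component is treated as a diacritic). The function names `label_components` and `split_diacritics` and the default `ratio=0.2` are illustrative assumptions.

```python
from collections import deque


def label_components(image):
    """Label 8-connected ink components of a binary image with a BFS flood fill.

    image: list of rows of 0/1 values (1 = ink).
    Returns a list of components, each a list of (row, col) pixels.
    """
    rows, cols = len(image), len(image[0]) if image else 0
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] == 1 and not seen[r][c]:
                # Flood-fill outward from this unvisited seed pixel.
                queue = deque([(r, c)])
                seen[r][c] = True
                pixels = []
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    # Visit all 8 neighbours that are ink and not yet labelled.
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and image[ny][nx] == 1 and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                components.append(pixels)
    return components


def split_diacritics(image, ratio=0.2):
    """Hypothetical size rule: components smaller than ratio * largest are diacritics."""
    components = label_components(image)
    if not components:
        return [], []
    threshold = ratio * max(len(p) for p in components)
    bodies = [p for p in components if len(p) >= threshold]
    diacritics = [p for p in components if len(p) < threshold]
    return bodies, diacritics


# Toy example: a single dot above a thick horizontal stroke.
img = [
    [0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1],
]
bodies, dots = split_diacritics(img)
print(len(bodies), len(dots))  # 1 1
```

A fixed size ratio is the simplest possible separation rule; real scripts contain small letter bodies and large marks, which is presumably one reason the paper also develops a clustering-based alternative.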


Similar resources

Features Extraction Method for Arabic Characters Based on Pixel Orientation Technique

This paper presents a feature extraction module for isolated handwritten Arabic characters. The collected core features are based on pixel orientations according to the Freeman chain code. The input to this module is an Arabic character in its basic shape (i.e., without diacritics). The feature extractor module, fed with a skeleton of an isolated character basic-shape, yields global and local featu...

Diacritics Recognition Based Urdu Nastalique OCR System

Improvements and new developments in the field of Artificial Intelligence have opened new horizons in the advancement of machines that originally had limited intelligence. Compared with the human brain, machines already have better computational speed and storage; however, there is still much room to improve their ability to acquire and process data and draw conclusions from it on their own. Optical...

Optical Character Recognition System for Urdu Words in Nastaliq Font

Optical Character Recognition (OCR) has been an attractive research area for the last three decades, and mature OCR systems reporting close to 100% recognition rates are available for many scripts/languages today. Despite these developments, research on text recognition in many languages, Urdu among them, is still in its early days. The limited existing literature on Urdu OCR is either l...

CRF-based Diacritisation of Colloquial Arabic for Automatic Speech Recognition

Most available colloquial Arabic speech resources are transcribed without diacritics. These diacritics provide short vowels and other pronunciation information, and omitting them introduces considerable ambiguity. In this paper, we propose using an automatic diacritisation method as a front end for training automatic speech recognition systems of colloquial Arab...

Recognition of Modern Arabic Poems

We propose a machine learning method for recognizing modern Arabic poems based on the common poetic features of modern Arabic poetry. The poetic features include rhyming, repetition, use of diacritics and punctuation, and text alignment. The method can classify text documents as poem or non-poem documents with a very high accuracy of 99.81%.

Journal:
  • Pattern Recognition

Volume 47

Publication date: 2014